Introduction

The following is intended as a set of tips for people learning how to use Git and GitHub.

There are many excellent guides to Git and GitHub online, e.g.,

And most relevantly the OpenSAFELY documentation here.

These tips are meant to supplement them.

Tips

Intro to Git

  • Git was written to allow developers to work on the source code of the Linux kernel
    • During one kernel release they got into a terrible mess
    • This provoked Linus Torvalds into action
    • For an excellent insight into his thinking watch this talk he gave at Google here
    • Git was designed to work with text files
    • Git can be intimidating, especially at the command line, and its error messages (like those of LaTeX and R) can be quite cryptic
  • A Git repository is a folder/directory on your computer which has been Git initialised
    • Using either the command line

      git init mynewfolder
    • Or GitHub Desktop

    • Repos on GitHub are already Git initialised

      • When you clone them down to your computer they work in GitHub Desktop
  • Git is commonly referred to as version control software
  • Git is better described as a content-addressable filesystem, which means that Git tracks the contents of the files in your repo
    • Git takes snapshots of your files when you tell it to; these snapshots are called commits

    • Each commit is identified by a SHA-1 hash computed from the contents of your files at that time together with the commit’s metadata (author, date, message, and parent commit)

    • Git knows the state of your files at every commit

      • So you can easily restore files to a previous state
    • For Git the state of your files only changes when their contents change

      • If you reopen a file, make no changes, then resave it, Git will show no changes
      • If you add an empty folder/directory to your repo Git will detect no changes in your repo
      • This differs from OneDrive/SharePoint/Google Drive, which are file synchronisation systems
    • I recommend not placing your Git repos in a location that is synced by OneDrive or Google Drive (their syncing technology is very different from Git’s)
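
The points above about Git tracking contents rather than files can be checked at the command line. The following is a minimal sketch, assuming Git is installed and using a throwaway temporary folder; the file and folder names are made up for illustration.

```shell
set -e                                   # stop at the first error

# Throwaway repository in a temporary folder (all names here are hypothetical)
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Example User"      # identity stored for this repo only
git config user.email "user@example.com"

# First snapshot: one file containing one line of text
printf 'hello\n' > notes.txt
git add notes.txt
git commit --quiet -m "Add notes.txt"

# Resave the file with identical contents and add an empty folder
printf 'hello\n' > notes.txt
mkdir empty_folder

# --porcelain prints one line per change, so empty output means no changes
changes="$(git status --porcelain)"
echo "changes seen by Git: '${changes}'"

# Identical contents are always given the same object ID (hash)
id_from_text="$(printf 'hello\n' | git hash-object --stdin)"
id_from_file="$(git hash-object notes.txt)"
echo "$id_from_text"
echo "$id_from_file"
```

Both echoed IDs should be identical, and the status output should be empty, because neither the resaved file nor the empty folder changes any file contents.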

The .git folder

  • When you initialise a directory the .git folder is created
  • This contains all of the files Git uses to track the contents of your files
  • Here is the .git folder of a repo on my computer (I have selected to View hidden files in Windows Explorer)
  • Confusingly GitHub hides the .git folder from view
  • Here are its contents - never edit these manually
  • Explanation of these is (from here)
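
If you would like to look inside a .git folder without risking a real repo, the sketch below initialises a throwaway folder and lists what Git created. It assumes Git is installed; the exact entries can vary a little between Git versions, but HEAD (which branch you are on), config (settings for this repo), objects (the stored snapshots), and refs (branch and tag pointers) should be present.

```shell
set -e                          # stop at the first error

# Initialise a throwaway folder (created by mktemp, so nothing real is touched)
repo_dir="$(mktemp -d)"
git init --quiet "$repo_dir"

# -A also lists hidden entries, so the .git folder shows up
ls -A "$repo_dir"

# Look, but never edit these by hand
ls "$repo_dir/.git"
```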

Common Git commands

  • I recommend you use GitHub Desktop instead of these commands
  • These commands are what GitHub Desktop is using behind the scenes
  • Git is the name of the program; git is the name of the executable available at your command line
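git init                                              # make the current folder a Git repository
git add <filename>                                    # stage a file for the next commit
git status                                            # show which files are changed, staged, or untracked
git commit -m "Your commit message"                   # record the staged changes as a commit
git commit --amend -m "Your amended commit message"   # replace the most recent commit, here with a new message
git push                                              # send your local commits to the remote (e.g., GitHub)
git pull                                              # fetch commits from the remote and merge them into your current branch
git clone <url>                                       # copy a remote repository to your machine
git branch                                            # list your local branches
git checkout <branch>                                 # switch to another branch
git merge <branch>                                    # merge another branch into your current branch
git fetch                                             # download commits from the remote without merging them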
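
To see how the local commands listed above fit together, here is a minimal sketch that can be run in a terminal. It assumes Git is installed and works in a throwaway temporary folder; the file name analysis.R and the branch name tidy-up are invented for the example. The commands that talk to a remote (push, pull, clone, fetch) are left out because they need a remote to talk to.

```shell
set -e                                   # stop at the first error
work_dir="$(mktemp -d)"
cd "$work_dir"

git init --quiet                         # turn the folder into a repository
git config user.name "Example User"      # identity recorded in commits (this repo only)
git config user.email "user@example.com"

printf 'x <- 1\n' > analysis.R
git status --short                       # "?? analysis.R": a new, untracked file
git add analysis.R                       # stage the file for the next commit
git commit --quiet -m "Add analysis scirpt"
git commit --quiet --amend -m "Add analysis script"   # fix the typo in the message
git branch -M main                       # make sure the branch is called main

git branch tidy-up                       # create a branch
git checkout --quiet tidy-up             # switch to it
printf 'x <- 1\ny <- 2\n' > analysis.R
git add analysis.R
git commit --quiet -m "Add y to analysis script"

git checkout --quiet main                # back to main
git merge --quiet tidy-up                # bring the branch's commit into main
git log --oneline                        # lists the commits, newest first
```

Amending replaces the first commit rather than adding another, so the log ends up with two commits.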

Installing Git and GitHub Desktop

Installing Git

  • Windows
    • Download and install from here
  • macOS comes with an outdated version of Git
    • I recommend installing the Homebrew version

    • First install Homebrew, see instructions here

    • Then run in your Terminal app

      brew upgrade
      brew install git
    • Additionally, on a Mac it is helpful to install the Xcode command line tools (this avoids installing the whole of Xcode)

      xcode-select --install
      • You must reinstall these every time you upgrade the operating system version, e.g., from Big Sur to Monterey
  • Once Git is installed its executable (called git) should be available at your command line
    • Check which version you have with (you want something recent-ish)

      git --version
    • On my Windows machine I have

      git version 2.33.1.windows.1

Installing GitHub Desktop

  • You could use Git through its command syntax; however, I recommend you use a graphical Git client
  • For Windows and macOS download and install GitHub Desktop from here
  • A Linux version of GitHub Desktop is available from here

Intro to GitHub

  • GitHub is a Git web server; there are others, e.g., GitLab

  • Your repositories will be stored on GitHub, and you will clone them to your machine to work on them (or work on them in Gitpod)

  • Under your user account you see the repos you own

  • On GitHub, OpenSAFELY is an organisation

    • The repos are owned by the organisation, so they show up under the organisation here

GitHub PAT for R

  • A GitHub Personal Access Token (PAT) allows you more downloads from GitHub per hour; to create one, run in R
install.packages("usethis")
library(usethis)
create_github_token()

GitHub CLI

  • GitHub CLI is a command line interface (CLI) for operating GitHub
  • Installation instructions are here
  • But I don’t recommend using this

Git and GitHub Workflow

Standard GitHub workflow

  • (I recommend forking a public repo only if you intend to send a pull request to it)
  • Fork the other person’s repo (this will be known as the upstream repo from your fork)
  • This creates a copy of their repo under your account (your fork)
  • Clone your fork (the copy under your account) to your machine
  • Create a new branch (do not work on master/main)
  • Make your changes and commit them
  • Push your new branch up to GitHub (i.e., to your fork)
  • Create a pull request (from your new branch) back to the default (master/main) branch of the original repo

Workflow with an OpenSAFELY GitHub repo

  • Skip the forking step from the standard GitHub workflow
  • The repo on GitHub is known as origin
  • Clone the repo to your local machine
    • Click: Code | Open with GitHub Desktop
    • Click Clone in the box which appears in GitHub Desktop
    • In GitHub Desktop (i.e. locally) make a new branch
  • Do some work
    • Make some changes (to your project.yaml/study_definition.py/R scripts)
    • In GitHub Desktop select relevant changed lines and make small-ish commits with sensible commit messages
    • Do not commit changes to many files with a single commit message such as “Edits”!
  • Push your new branch from your local machine up to GitHub
  • Make a pull request from your branch to the default branch
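
For reference, the clone, branch, commit, and push steps above look roughly like this at the command line. This is only a sketch: a local bare repository (origin.git) stands in for the repo on GitHub so that it runs without a network connection, and the branch name, file contents, and commit messages are all invented. The pull request itself is still made on GitHub or in GitHub Desktop.

```shell
set -e                                   # stop at the first error
# Identity for every commit made in this script (no global settings are changed)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
base_dir="$(mktemp -d)"
cd "$base_dir"

# Stand-in for the repo on GitHub: a bare repository with one commit on main
git init --quiet seed
cd seed
printf '# placeholder project file\n' > project.yaml
git add project.yaml
git commit --quiet -m "Add project.yaml"
git branch -M main
cd "$base_dir"
git clone --quiet --bare seed origin.git

# Clone the repo; the place it was cloned from is recorded as "origin"
git clone --quiet origin.git my-study
cd my-study

# Make a new branch for the work
git checkout --quiet -b add-descriptive-table

# Two small commits, each with its own sensible message
printf '# placeholder change\n' >> project.yaml
git add project.yaml
git commit --quiet -m "Add descriptive table action to project.yaml"

printf '# placeholder R script\n' > table1.R
git add table1.R
git commit --quiet -m "Add script for descriptive table"

# Push the new branch; -u links it to origin so later pushes need no arguments
git push --quiet -u origin add-descriptive-table
git branch -r                            # remote branches now include the new one
```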

Making a pull request

  • Let’s start by creating a new branch
  • We do some work and make a new commit which adds the new file to the repo
  • Next publish the new branch to GitHub
  • Now initiate the creation of the PR by either clicking in GitHub Desktop “Create Pull Request”
  • or clicking on the button on the repo webpage “Compare & pull request”
  • Edit the title box, add some extra text in the comment box, select a reviewer, and then click “Create pull request”
  • You can amend/edit pull requests by modifying/adding commits to the branch from which you sent the PR
  • See more about pull request reviews here
  • Merge PR
  • Confirm the merge
  • (Optional) Delete the branch the PR came from
  • The PR is now finished and we can see the merge commit in the default (main/master) branch
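
The merge commit mentioned in the last step can be reproduced locally. The sketch below assumes the pull request was merged with GitHub’s “Create a merge commit” option, which behaves like git merge --no-ff; the repo, branch, and file names are made up, and no real pull request is involved.

```shell
set -e                                   # stop at the first error
pr_dir="$(mktemp -d)"
cd "$pr_dir"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

printf '# My study\n' > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main

# The branch the pull request would come from
git checkout --quiet -b add-new-file
printf 'new file\n' > newfile.txt
git add newfile.txt
git commit --quiet -m "Add newfile.txt"

# Merge into main; --no-ff forces a merge commit even when a fast-forward is possible
git checkout --quiet main
git merge --quiet --no-ff -m "Merge branch add-new-file" add-new-file

# (Optional) delete the branch once it is merged
git branch -d add-new-file

git log --oneline --graph                # the merge commit sits at the top of main
```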

Common errors

Forgetting to pull down the latest changes from GitHub

  • (Especially in the morning) It is very easy to forget to pull down latest changes when reopening a project
  • Let’s say I or a colleague made changes and those are pushed to GitHub
    • The next day I restart work on a different computer and GitHub Desktop will show, for example
  • But I forget to click “Pull origin”
  • If you commit to a branch while GitHub holds commits on that branch which you have not yet pulled, you can get a merge error when you eventually click “Pull origin”
  • You could resolve the conflict, e.g., in VSCode
  • A warning sign is seeing both up and down arrows in the “Pull origin” box, although this does not always lead to an error
  • Fix
    • Move your changes to a new branch

    • Move back to master/main and undo the changes there, then revert any remaining edits so that the files show no changes

    • Pull down the changes from GitHub

    • Merge changes from your new branch into the main/master/relevant branch
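
The four fix steps can also be carried out at the command line. The following sketch sets up the situation first: a local bare repository stands in for GitHub, and two clones (hypothetically named colleague and mine) stand in for the two computers. It assumes the unpulled work is a single commit on main, and all file and branch names are invented.

```shell
set -e                                   # stop at the first error
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
base_dir="$(mktemp -d)"
cd "$base_dir"

# Stand-in for GitHub, plus two clones of it
git init --quiet seed
( cd seed && printf 'line 1\n' > report.md && git add report.md \
    && git commit --quiet -m "Start report" && git branch -M main )
git clone --quiet --bare seed origin.git
git clone --quiet origin.git colleague
git clone --quiet origin.git mine

# A colleague pushes a commit that I never pull
( cd colleague && printf 'colleague notes\n' > notes.md && git add notes.md \
    && git commit --quiet -m "Add notes" && git push --quiet origin main )

# I commit on main without pulling first
cd mine
printf 'line 2\n' >> report.md
git commit --quiet -am "Extend report"

# 1. Move my changes to a new branch (it points at my unpulled commit)
git branch rescue-my-work
# 2. Undo my commit on main, returning it to the last state I pulled
git reset --quiet --hard origin/main
# 3. Pull down the changes from GitHub
git pull --quiet --ff-only origin main
# 4. Merge my work back into main
git merge --quiet --no-edit rescue-my-work

git log --oneline --graph
```

Here the two sets of changes touch different files, so the final merge succeeds without any manual editing. If they had changed the same lines, step 4 would stop with a merge conflict to resolve.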

Merge conflict

See

  • About merge conflicts here
  • Resolving a merge conflict here
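
A merge conflict is easiest to understand by causing one on purpose in a throwaway repo. In this sketch (all names are hypothetical) two branches change the same line of the same file, the merge stops, and the conflict is resolved by writing the final contents and committing.

```shell
set -e                                   # stop at the first error
conflict_dir="$(mktemp -d)"
cd "$conflict_dir"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

printf 'colour: red\n' > settings.txt
git add settings.txt
git commit --quiet -m "Add settings"
git branch -M main

# One branch changes the line to blue ...
git checkout --quiet -b use-blue
printf 'colour: blue\n' > settings.txt
git commit --quiet -am "Use blue"

# ... while main changes the same line to green
git checkout --quiet main
printf 'colour: green\n' > settings.txt
git commit --quiet -am "Use green"

# The merge stops because both branches changed the same line
git merge use-blue || echo "merge stopped with a conflict"
unmerged="$(git diff --name-only --diff-filter=U)"   # files still in conflict
cat settings.txt                         # both versions, wrapped in conflict markers

# Resolve: write the contents you want, stage the file, and commit
printf 'colour: blue\n' > settings.txt
git add settings.txt
git commit --quiet -m "Merge use-blue, keeping blue"
```

In practice you would edit the conflicted file in an editor such as VSCode rather than overwrite it with printf, but the stage-and-commit step afterwards is the same.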

OpenSAFELY repositories

  • OpenSAFELY is a system of Python packages which run various Docker containers
    • The main GitHub organisation page is here
    • All the core code is published in their opensafely-core organisation on GitHub here
    • And there is also their opensafely-actions organisation here
  • A Docker container is like a virtual machine
    • It defines the operating system and programs running within it
    • On my Windows 10 machine I can run an Ubuntu Docker container
    • Just because an R package is installed in the R installation on your machine does not mean that it is installed in the OpenSAFELY R Docker container

Getting started

  • See the OpenSAFELY (OS) page here
  • If creating a new repo, create it from the OS template here
  • This is already Git initialised

Running jobs (on the dummy data)

  • In your OS repo online
    • Use Gitpod
  • On your own machine - install the following free software
    • (If on Windows - Windows Subsystem for Linux version 2)
    • Docker Desktop
    • Python
    • Git
    • GitHub Desktop
    • VSCode text editor

Additional topics

Writing good commit messages

  • Follow the standard recommendations about making commit messages, see

Files for Git to ignore

  • You should not commit all files in the folder on your computer into your repo
  • The .gitignore file is a list of files and folders in your repo for Git to ignore
  • Common files to ignore are
    • .Rhistory
    • .DS_Store
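
As an illustration, the sketch below writes a .gitignore containing the two entries above into a throwaway repo and shows that Git then leaves those files out of its status. The file analysis.R is a made-up example of a file you would want to track.

```shell
set -e                                   # stop at the first error
ignore_dir="$(mktemp -d)"
cd "$ignore_dir"
git init --quiet

# One pattern per line; lines starting with # are comments
cat > .gitignore <<'EOF'
# R session history
.Rhistory
# macOS Finder metadata
.DS_Store
EOF

# Create the two ignored files and one file worth tracking
touch .Rhistory .DS_Store analysis.R

# Only .gitignore and analysis.R are listed as untracked
git status --short
```

The .gitignore file itself is normally committed, so that everyone who clones the repo ignores the same files.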

GitHub repos contain more than just code

  • A repo for an R package will probably contain
    • The code for the R package
    • The code for its website (often made with pkgdown and hosted with GitHub Pages or Netlify)
    • Scripts for controlling continuous integration services such as GitHub Actions